Utilisation de la langue naturelle pour l'interrogation de documents structurés [Using natural language to query structured documents]
http://www.asso-aria.org/coria/2005/19.pdf
The query language is the indispensable interface between the user and the search tool. Reduced to its simplest form when engines mostly index flat documents, it becomes quite complex when it targets structured documents and constraints must be defined on both structure and content. The approach described here proposes using natural language as the interface for expressing such queries. The article first describes the successive phases that transform (in an information retrieval setting) the natural-language query into a context-independent semantic representation. Simplification rules adapted to the structure and domain of the corpus are then applied, yielding a final form suited to conversion into a formal query language. The article finally describes the experiments carried out and draws first conclusions on various aspects of this approach.
Justification of Answers by Verification of Dependency Relations: The French AVE Task
This paper presents LIMSI's results in the Answer Validation Exercise (AVE) 2008 for French. We tested two approaches during this campaign: a syntax-based strategy and a machine learning strategy. Results of both approaches are presented and discussed.
Supervised Machine Learning Techniques to Detect TimeML Events in French and English
Identifying events in texts is an information extraction task necessary for many NLP applications. Through the TimeML specifications and the TempEval challenges, it has received some attention in recent years; yet, no reference result is available for French. In this paper, we try to fill this gap by proposing several event extraction systems, combining for instance Conditional Random Fields, language modeling and k-nearest neighbors. These systems are evaluated on French corpora and compared with state-of-the-art methods on English. The very good results obtained on both languages validate our whole approach.
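One of the combined techniques, k-nearest-neighbors classification of tokens, can be sketched roughly as follows. The features, tags and training examples below are invented for illustration and are not the paper's actual feature set or data:

```python
# Toy sketch of a k-nearest-neighbors component for event detection:
# classify tokens as TimeML events or not from simple lexical features.
# Features and training examples are illustrative assumptions.
from collections import Counter

def features(token):
    """Map a (word, pos_tag) pair to a small numeric feature vector."""
    word, pos = token
    return (
        1.0 if pos.startswith("V") else 0.0,    # verbs often denote events
        1.0 if word.endswith("tion") else 0.0,  # deverbal nouns ("explosion")
        len(word) / 10.0,
    )

def knn_predict(train, token, k=3):
    """Label a token by majority vote among its k nearest training examples."""
    x = features(token)
    dist = lambda y: sum((a - b) ** 2 for a, b in zip(x, y))
    nearest = sorted(train, key=lambda ex: dist(features(ex[0])))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [
    (("exploded", "VBD"), "EVENT"),
    (("explosion", "NN"), "EVENT"),
    (("said", "VBD"), "EVENT"),
    (("table", "NN"), "O"),
    (("blue", "JJ"), "O"),
    (("paris", "NNP"), "O"),
]
print(knn_predict(train, ("erupted", "VBD")))  # nearest neighbors are verbs -> "EVENT"
```

In a realistic system the features would come from a tagger and lemmatizer, and this vote would typically be combined with CRF and language-model scores rather than used alone.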
Question Generation for French: Collating Parsers and Paraphrasing Questions
This article describes a question generation system for French. The transformation of declarative sentences into questions relies on two different syntactic parsers and on named entity recognition tools. This makes it possible to further diversify the generated questions and to partly alleviate the problems inherent in the analysis tools. The system also generates reformulations of the questions based on variations in the question words, inducing answers of different granularities, and on nominalisations of action verbs. We evaluate the questions generated for sentences extracted from two corpora: a corpus of newspaper articles used for the CLEF Question Answering evaluation campaign and a corpus of simplified online encyclopedia articles. The evaluation shows that the system generates a majority of good and medium-quality questions. We also present an original evaluation of the question generation system using the question analysis module of a question answering system.
Évaluation de la contextualisation de tweets [Evaluating tweet contextualization]
This article addresses the evaluation of tweet contextualization. Contextualization is defined as a summary that puts back into context a text which, because of its length, does not contain all the elements a reader needs to understand all or part of its content. We define an evaluation framework for tweet contextualization that generalizes to other short texts. We propose a reference collection as well as ad hoc evaluation measures. This evaluation framework was successfully applied in the INEX Tweet Contextualization campaign. In light of the results obtained during this campaign, we discuss the measures used in relation to other measures from the literature.
Overview of INEX Tweet Contextualization 2013 track
Twitter is increasingly used for online client and audience fishing; this motivated the tweet contextualization task at INEX. The objective is to help a user understand a tweet by providing a short summary (500 words). This summary should be built automatically from local resources such as Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. The task is evaluated on informativeness, which is computed using a variant of Kullback-Leibler divergence together with passage pooling. Meanwhile, the effective readability in context of the summaries is checked using binary questionnaires on small samples of results. Running since 2010, the results show that only systems that efficiently combine passage retrieval, sentence segmentation and scoring, named entity recognition, POS analysis, anaphora detection, a content diversity measure and sentence reordering are effective.
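An informativeness score in the spirit of this Kullback-Leibler-based evaluation can be sketched as follows: compare the word distribution of a candidate summary against that of a pool of relevant passages. The smoothing constant, tokenization and example texts are illustrative assumptions, not the official INEX measure:

```python
# Sketch of a KL-divergence-based informativeness score: a summary whose
# unigram distribution is close to the pooled reference passages scores lower
# (better). Smoothing and whitespace tokenization are simplifying assumptions.
import math
from collections import Counter

def distribution(text, vocab, alpha=0.01):
    """Unigram distribution over vocab with additive smoothing."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(reference, summary):
    """KL(reference || summary): lower means the summary covers the pool better."""
    vocab = set(reference.lower().split()) | set(summary.lower().split())
    p = distribution(reference, vocab)
    q = distribution(summary, vocab)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)

pool = "the eruption forced the village to evacuate overnight"
good = "village evacuated overnight after eruption"
bad = "stock markets rallied on tech earnings"
print(f"KL(pool || good) = {kl_divergence(pool, good):.3f}")
print(f"KL(pool || bad)  = {kl_divergence(pool, bad):.3f}")
```

The actual campaign measure also involves passage pooling and length normalization; this sketch only shows the core divergence comparison.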
Overview of INEX Tweet Contextualization 2014 track
Messages of 140 characters are rarely self-contained. The Tweet Contextualization task aims at automatically providing information, in the form of a summary, that explains a tweet. This requires combining multiple types of processing, from information retrieval to multi-document summarization, including entity linking. Running since 2010, the 2014 task was a slight variant of previous ones, considering more complex queries from RepLab 2013. Given a tweet and a related entity, systems had to provide some context about the subject of the tweet from the perspective of the entity, in order to help the reader understand it.
Impact of translation on biomedical information extraction from real-life clinical notes
The objective of our study is to determine whether using English tools to
extract and normalize French medical concepts on translations provides
comparable performance to French models trained on a set of annotated French
clinical notes. We compare two methods: a method involving French language
models and a method involving English language models. For the native French
method, the Named Entity Recognition (NER) and normalization steps are
performed separately. For the translated English method, after the first
translation step, we compare a two-step method and a terminology-oriented
method that performs extraction and normalization at the same time. We used
French, English and bilingual annotated datasets to evaluate all steps (NER,
normalization and translation) of our algorithms. Concerning the results, the
native French method performs better than the translated English one, with a
global F1 score of 0.51 [0.47; 0.55] against 0.39 [0.34; 0.44] and 0.38
[0.36; 0.40] for the two English methods tested. In conclusion, despite recent
improvements in translation models, there is a significant performance
difference between the two approaches in favor of the native French method,
which is more effective on French medical texts, even with few annotated
documents.
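The bracketed intervals around the F1 scores read as confidence intervals; one common way to obtain them is a percentile bootstrap over documents. The sketch below uses synthetic per-document counts, and the resampling scheme is an assumption rather than the authors' documented procedure:

```python
# Sketch: micro-averaged F1 with a percentile-bootstrap confidence interval,
# producing output in the style of "0.51 [0.47; 0.55]". The per-document
# (tp, fp, fn) counts are synthetic, for illustration only.
import random

def f1(docs):
    """Micro-averaged F1 over (true_pos, false_pos, false_neg) triples."""
    tp = sum(d[0] for d in docs)
    fp = sum(d[1] for d in docs)
    fn = sum(d[2] for d in docs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def bootstrap_ci(docs, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for F1, resampling documents with replacement."""
    rng = random.Random(seed)
    scores = sorted(
        f1([rng.choice(docs) for _ in docs]) for _ in range(n_resamples)
    )
    lo = scores[int((1 - level) / 2 * n_resamples)]
    hi = scores[int((1 + level) / 2 * n_resamples)]
    return lo, hi

rng = random.Random(42)
docs = [(rng.randint(2, 8), rng.randint(0, 5), rng.randint(0, 5))
        for _ in range(50)]
point = f1(docs)
lo, hi = bootstrap_ci(docs)
print(f"F1 = {point:.2f} [{lo:.2f}; {hi:.2f}]")
```

Resampling at the document level (rather than the entity level) matches the idea that clinical notes, not individual mentions, are the independent units.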
Good practices for clinical data warehouse implementation: a case study in France
Real World Data (RWD) holds great promise to improve the quality of care.
However, specific infrastructures and methodologies are required to derive
robust knowledge and bring innovations to the patient. Drawing upon the
national case study of the governance of the 32 French regional and university
hospitals, we highlight key aspects of modern Clinical Data Warehouses (CDWs):
governance, transparency, types of data, data reuse, technical tools,
documentation and data quality control processes. Semi-structured interviews
and a review of reported studies on French CDWs were conducted from March to
November 2022. Out of 32 regional and
university hospitals in France, 14 have a CDW in production, 5 are
experimenting, 5 have a prospective CDW project, 8 did not have any CDW project
at the time of writing. The implementation of CDWs in France dates from 2011
and accelerated in late 2020. From this case study, we draw some general
guidelines for CDWs. The current orientation of CDWs towards research requires
efforts in governance stabilization, standardization of data schema and
development in data quality and data documentation. Particular attention must
be paid to the sustainability of the warehouse teams and to the multi-level
governance. The transparency of the studies and of the data transformation
tools must improve to enable successful multi-centric data reuse as well as
innovations in routine care.
Utilisation de la syntaxe pour valider les réponses à des questions par plusieurs documents [Using syntax to validate answers to questions across several documents]
This article presents FIDJI, a question-answering system for French that combines syntactic information about the question and the documents with more traditional techniques of the field, such as named entity recognition and term weighting. Within this system, we notably experiment with validating answers across several documents, as well as with specific techniques for answering different types of questions (such as questions expecting multiple answers (lists) or a Boolean answer).
- …